Cell Systems — Latest Matching Preprints

1

Impacts of batch effects on the performance of machine learning classifiers across multiple studies

Raab, P.; Johnson, W. E.; Piccolo, S. R.

2026-06-30 bioinformatics 10.64898/2026.06.24.734352 medRxiv

Top 0.1%

39.9%

Precision medicine relies on accurate and generalizable predictions for patients across the spectrum of human diversity. Because capturing biological heterogeneity requires large sample sizes, researchers must often aggregate data from several experimental batches or independent studies. This integration allows for greater statistical power and diversity than a single study could provide, while avoiding the costs of generating massive new -omics datasets. Predictive models trained on these aggregated data are theoretically better equipped to detect subtle patterns that generalize to new data. However, this potential is frequently undermined by "batch effects"--systematic technical artifacts that can bias model training to predict experimental batches and shadow meaningful biological conditions. Models trained on data with batch effects can exhibit substantially degraded performance when applied to data from new batches. Statistical adjustment methods can mitigate these artifacts while preserving biological signals. To ensure these adjustments actually facilitate generalization, we emphasize the use of external, independent cohorts for rigorous validation. This chapter examines how batch effects impact predictions and compares various adjustment methods.
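As a rough illustration of what a statistical batch adjustment does, the sketch below applies per-batch mean-centering to a single feature. This is the simplest possible adjustment and is not necessarily one of the methods the chapter compares; the function name `center_by_batch` and the toy values are assumptions made for illustration.

```python
from collections import defaultdict

def center_by_batch(values, batches):
    """Subtract each batch's mean from its own samples (one feature at a time)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, batch in zip(values, batches):
        sums[batch] += value
        counts[batch] += 1
    means = {batch: sums[batch] / counts[batch] for batch in sums}
    return [value - means[batch] for value, batch in zip(values, batches)]

# Hypothetical expression values for one gene measured in two studies.
expression = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
batch_labels = ["study_a"] * 3 + ["study_b"] * 3

adjusted = center_by_batch(expression, batch_labels)
print(adjusted)  # [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
```

Centering removes the offset between the two studies, but it would also remove a real biological difference if the condition of interest were confounded with batch, which is one reason validation on an external cohort matters.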

2

Computational screen of promoter configurations that robustly sense transcription factor dynamics

Shoyer, T. C.; Di Ventura, B.

2026-05-20 systems biology 10.64898/2026.05.19.726340 medRxiv

Top 0.1%

39.1%

Transcription factors (TFs) respond to external stimuli with time-varying changes in activity or localization (TF dynamics), driving differential transcriptional programs. Previous studies indicated that TF dynamics can be decoded at the promoter level in eukaryotes, yet a systematic understanding of robust solutions is lacking. By computationally screening over 10,000 mathematical models of multi-state promoters with various forms of TF-mediated regulation, we identify robust configurations that selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") TF dynamics. Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering. In contrast, robust pulse boosting is achieved by promoters with a TF-mediated refractory state that permits short activation and recovers between pulses. Bifunctional TFs that exert activator- and repressor-like regulation extend the design space for pulse boosting. These results reveal general principles by which promoters interpret TF dynamics and suggest strategies to engineer synthetic systems to exploit them.

Highlights

- Computational screen of over 10,000 promoter models identifies features that enable promoters to selectively respond to sustained ("pulse filtering") or pulsatile ("pulse boosting") transcription factor (TF) dynamics.
- Promoters that activate via intermediate states and have negatively regulated deactivation robustly perform pulse filtering.
- Promoters with TF-regulated refractoriness robustly perform pulse boosting.
- Promoters regulated by bifunctional TFs extend the design space for pulse boosting.

3

Breaking the Synthesis Barrier for AI-Designed DNA Libraries

Sussex, S.; Borevkovic, E.; Lohmann, F.; Chen, N.; Lüthi, E.; Reddy, S. T.; Krause, A.

2026-07-07 bioengineering 10.64898/2026.07.07.736931 medRxiv

Top 0.1%

38.3%

Designing DNA libraries is a key challenge from drug design to protein engineering and synthetic biology. Modern generative models offer opportunities to navigate the design space and propose specific sequences predicted to be effective in-silico. Designing deterministic libraries of specific sequences is however limited by the cost of DNA synthesis -- the synthesis barrier. In contrast, high-throughput multiplexed screening can measure the function of billions of biological sequences in parallel. Harnessing this technology requires the design of randomized libraries with specific design constraints to achieve low synthesis costs. In practice, such stochastic libraries are often chosen heuristically, sacrificing control for scale. Is there a way to bridge AI-based in-silico sequence design with high-throughput experimentation? In this work, we introduce Policy Gradients for Library Design (PGLD). PGLD uses a synthesis-aware parametrization of stochastic DNA libraries and optimizes them against a specified objective function. This allows for designing massive, controlled libraries without being limited by synthesis costs. We show how PGLD enables lab-in-the-loop design of multi-round high-throughput experiments, and large-scale in-vitro DNA sampling from generative models. Finally, we use PGLD to design a library of ~10^6 unique sequences which is synthesized at a cost of ~700 USD to explore the mutation space of a broadly neutralizing influenza antibody.

4

Perturbation Curve models continuous transcriptional response trajectories and improves prediction of genetic modulations

Zhong, Y.; Wang, L.; Yang, G.; Yu, L.; Qi, X.; Jiang, H.

2026-06-19 bioinformatics 10.64898/2026.06.16.732192 medRxiv

Top 0.1%

33.8%

Single-cell CRISPR screens, Perturb-seq, have revolutionized functional genomics by revealing biological causality. However, although perturbation assignments are typically represented as discrete labels, the cell-level effective strength of perturbations is often continuous and diverse. Current analytical frameworks struggle to decouple the variability in perturbation strength from the diversity of downstream responses. Here, we present Perturbation Curve (PertCurve), a nonlinear, curve-based computational framework that models the trajectories of transcriptomic responses by explicitly incorporating diverse perturbation magnitudes and strengths. By ordering cells by perturbation strength, we demonstrate that PertCurve accurately recapitulates the response magnitudes and reveals the distinct modularity and asynchrony patterns of downstream gene behaviors. These patterns are categorized into archetypes, including proportional, sensitive, and threshold responses. By applying this framework across CRISPRi/a modalities, we identify universal response patterns in viral infection, apoptosis, and proliferation genes, and reveal previously overlooked context-specific regulatory features in cell differentiation. Finally, incorporating PertCurve into perturbation prediction models and evaluation metrics enhances predictive performance, delivering actionable insights for refining established models.

5

Harnessing AI to Build Virtual Cells

Cheng, X.; Li, P.; Guo, H.; Liang, Y.; Gong, J.; de Vazelhes, W.; Gou, C.; Xie, P.; Song, L.; Xing, E. P.

2026-04-30 bioinformatics 10.64898/2026.04.11.717183 medRxiv

Top 0.1%

32.8%

A virtual cell is a world model of a cell: a computational system that predicts, simulates and programs cellular processes across modalities and scales. An important path toward this goal is to model how genetic and chemical perturbations give rise to transcriptional responses, a core capability for disease understanding and drug discovery. However, current approaches remain expert-intensive, relying on iterative manual model design, training and debugging over months. Here we present VCHarness, an autonomous AI system that constructs perturbation-response models by combining an AI coding agent with multimodal biological foundation models. The system explores large spaces of architectures and training pipelines with minimal human intervention, iteratively generating, evaluating and refining candidate models. Across multiple perturbation-response benchmarks, VCHarness identifies architectures that outperform expert-designed approaches while reducing development time from months to days. It further uncovers non-obvious architectural patterns associated with improved performance, indicating that automated search can extend beyond conventional design strategies. These results suggest a shift from manually engineered models toward autonomous systems for constructing components of virtual cell world models, enabling scalable and data-driven exploration of cellular systems.

6

Coarse-graining metabolic networks via feature learning reveals cross-species growth laws

Zhu, A.; Ho, P.-Y.

2026-05-15 systems biology 10.64898/2026.05.13.725055 medRxiv

Top 0.1%

31.2%

Bacterial growth and the underlying metabolic networks are highly dissimilar across species, posing a fundamental challenge for bioengineering tasks involving diverse species. For a given species across nutrient environments, growth is regulated via proteome allocation, which gives rise to linear relationships between growth and the sizes of coarse-grained proteome sectors. However, whether and how coarse-grained growth predictors generalize across species remain unclear. Here, using genome-scale metabolic models, we discover a simple cross-species trend in which the monoculture growth of a species is proportional to the number of nutrients it utilizes, indicating that the latter is a regulatory feature that is conserved across species. By coarse-graining metabolic networks using feature learning, we identify novel proteome sectors whose sizes exhibit cross-species correlations with growth in wide-ranging experiments, suggesting that these sectors are also conserved regulatory features. We further show that the sectors enable a predictive encoding of proteome costs and growth benefits, thereby providing a potential explanation for how coarse-grained network features emerge to be simple determinants of growth across diverse metabolic networks.

7

DoFormer: Causal Transformer for Gene Perturbation

Karbalayghareh, A.; Paull, E.; Califano, A.

2026-05-04 bioinformatics 10.64898/2026.05.02.722054 medRxiv

Top 0.1%

31.1%

Learning causal gene regulatory mechanisms from single-cell data, and thereby predicting the effects of unseen perturbations, remains challenging. Observational RNA-seq data alone is insufficient for causal modeling, whereas perturbational data is essential. Classical causal inference methods often rely on unrealistic directed acyclic graph (DAG) assumptions and are not well suited to integrating multimodal data. Current transcriptomic foundation models also typically treat observational and perturbational data identically, limiting their ability to model perturbations. We present DoFormer, a causal multimodal Transformer that makes no DAG assumptions and leverages rich perturbational data to accurately predict previously unseen perturbations. DoFormer enables principled in silico perturbations by adapting the causal do-operator within the attention mechanism: the perturbed gene is set to the intervention value and prevented from attending to other genes, allowing the model to fully distinguish observational from interventional regimes. We train DoFormer using biologically informed loss functions and evaluate it with comprehensive perturbation prediction metrics. DoFormer substantially improves perturbation prediction relative to baseline and prior foundation models, underscoring the importance of intervention-aware architectures and biologically grounded objectives for causal modeling in single-cell genomics.
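The masking idea described in this abstract (clamp the perturbed gene to its intervention value and stop it from attending to other genes) can be sketched with a toy single-head self-attention over scalar gene tokens. This is only an illustration under strong assumptions: there are no learned projections, the score is a plain product, and `do_attention` is a hypothetical name, not DoFormer's API.

```python
import math

def softmax(scores):
    """Numerically stable softmax; entries of -inf receive weight 0."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def do_attention(values, perturbed, do_value):
    """Toy self-attention with a do-style mask on one intervened gene."""
    x = list(values)
    x[perturbed] = do_value  # clamp the intervened gene to the intervention value
    n = len(x)
    outputs, weights = [], []
    for i in range(n):
        # Toy attention score: product of the two scalar tokens (an assumption).
        scores = [x[i] * x[j] for j in range(n)]
        if i == perturbed:
            # The intervened gene may attend only to itself, so no other
            # gene can influence its representation.
            scores = [s if j == i else float("-inf") for j, s in enumerate(scores)]
        row = softmax(scores)
        weights.append(row)
        outputs.append(sum(w * v for w, v in zip(row, x)))
    return outputs, weights

# Hypothetical expression values; gene 1 is knocked out (set to 0.0).
outputs, weights = do_attention([0.5, 1.0, -0.3], perturbed=1, do_value=0.0)
print(weights[1])  # [0.0, 1.0, 0.0]
```

Other genes can still attend to the intervened gene, so the intervention propagates downstream while the intervened gene itself stays fixed at its assigned value.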

8

Task-adapted biological foundation models uncover perturbation-centric representations

Pareja-Lorente, E.; Aloy, P.

2026-07-05 bioinformatics 10.64898/2026.06.30.735584 medRxiv

Top 0.1%

23.1%

Foundation models have emerged as powerful tools for learning transferable representations of biological systems, yet their latent spaces are typically optimized to capture cellular state rather than the effects of perturbations. Here, we demonstrate that a biological foundation model can be repurposed to learn a fundamentally different representation by changing its learning objective. We finetuned scGPT, a transformer pre-trained on over 30 million single cell transcriptomes, on more than three million LINCS L1000 perturbation profiles using a supervised objective that predicts perturbation identity. This transformed the latent space into a perturbation centric representation that aligned transcriptional responses induced by the same chemical or genetic perturbation across heterogeneous experimental conditions. Finetuned embeddings substantially outperformed both gene expression profiles and the original pretrained model, recovering [85, 100%] of perturbations within the top 100 nearest neighbors and increasing perturbation classification accuracy from [10, 19%] to [25, 49%]. Remarkably, although the model was trained exclusively to recognize perturbation identity, the learned representation spontaneously captured orthogonal biological relationships never provided during training, including chemical similarity (AUROC up to 0.81), mechanisms of action (Hit@10 up to 100%), compound target relationships (AUROC up to 0.74), and functional relationships between genetic perturbations. The resulting embedding space enabled mechanism-of-action annotation of nearly 12,000 previously uncharacterized compounds, prioritization of target related chemical genetic associations, and contextualization of unseen perturbations and external transcriptomic datasets. Together, our results establish objective-driven adaptation as a general strategy for repurposing biological foundation models to learn reusable representations of complex biological phenomena.

9

Mechanisms Matter: Transportability of Cellular Perturbation Effects

Qi, S.-a.; Chapfuwa, P.

2026-05-12 bioinformatics 10.64898/2026.05.08.723625 medRxiv

Top 0.1%

22.9%

Predicting cellular responses to genetic or chemical perturbations across biological contexts is central to drug development and disease understanding. Despite increases in data and model scale, deep learning models have not consistently outperformed simple baselines. Leveraging causal transportability theory, we show that cross-context generalization is governed by shared causal mechanisms, not merely distributional similarity. To enable controlled evaluation, we develop a causal simulator that generates realistic semi-synthetic Perturb-seq datasets with tunable mechanistic divergence, providing benchmarks with known ground-truth causal structure. Further, we adapt the Vendi diversity score to the perturbation setting as a diagnostic for mode collapse, a failure mode invisible to standard per-perturbation metrics. Extensive experiments across four deep learning models and six simple baselines on semi-synthetic and real Perturb-seq datasets reveal a cross-context generalization gap: performance under cross-context splits drops substantially, often to simple baseline levels. Notably, even on synthetic data with fully specified causal structure, no model generalized across contexts with different causal mechanisms. These results underscore the need for cross-context evaluation, diversity-aware metrics, and mechanistically grounded inductive biases.

10

Minimal Computational Framework for Systematic Identification of Antimicrobial Targets

Hassan, S. A.

2026-05-28 bioinformatics 10.64898/2026.05.24.727537 medRxiv

Top 0.1%

22.5%

Systematic identification of antimicrobial targets remains a major challenge, as discovery still relies largely on empirical, resource-intensive approaches with limited efficiency. We present a method for identifying antimicrobial targets based on protein dynamics, enabling rational polypharmacology. The approach spans multiple biological scales, from taxa (genus and species) to biological networks, including network hubs and edges, their constituent proteins, protein binding sites, and their conformational states. It is grounded in the premise that coordinated intervention across multiple, optimally selected targets, using combinations of compounds at safe or submaximal doses, can achieve therapeutic effects while reducing toxicity and limiting mutational escape. A survey of known antimicrobials indicates that a small number of recurrent protein-level mechanisms account for most disruptions of microbial survival. We introduce metrics to detect these mechanisms across a pathogen proteome and describe a streamlined, modular workflow for target identification and prioritization that is optimized for ease of deployment and naturally interfaces with downstream applications such as molecular screening and de novo design.

11

Unbalanced Perturbation Dynamics For Cell Fate Design

Peng, Q.; Wang, Y.; Li, J.; Wang, X.; Xiao, Y.; Zhou, P.

2026-07-04 bioinformatics 10.64898/2026.06.30.735555 medRxiv

Top 0.1%

22.2%

Large-scale single-cell perturbation sequencing provides an unprecedented opportunity to construct virtual cells for the in silico simulation of cellular responses and the inverse design of optimal interventions. However, most perturbation-response models treat cellular responses primarily as mass-preserving shifts in transcriptomic state, whereas single-cell perturbation measurements are inherently unbalanced: the recovered endpoint population is shaped by technical sampling as well as biological perturbation-induced proliferation, apoptosis and selection. Here we introduce U-Pert, an unbalanced generative framework that learns condition- and context-dependent perturbation dynamics from unpaired single-cell snapshots. U-Pert jointly models transcriptomic state transitions and cell-number dynamics, enabling scalable and robust forward prediction of unseen perturbations and contexts, as well as inverse design to screen for desired genetic or pharmacological interventions that achieve user-defined transcriptomic or population-level outcomes. Across controlled simulations, genetic perturbation benchmarks, sciPlex3 drug responses and PBMC cytokine perturbations, U-Pert predicts unseen responses, captures both molecular and abundance changes, and performs inverse design for target gene-expression programs and cell-type compositions. These results show that cell abundance is an integral component of the perturbation phenotype, providing a mass-aware framework for virtual-cell modeling and perturbation cell fate design.

12

DrugPTM-Bench: A Large-Scale Dataset for Predictive Modeling of Drug-Induced Cell Type-Specific Protein Post-Translational Modifications

Badkul, A.; Mottaqi, M.; Xie, L.; Xie, L.

2026-04-30 systems biology 10.64898/2026.04.27.721113 medRxiv

Top 0.1%

21.9%

Protein post-translational modifications (PTMs), particularly phosphorylation, serve as the primary "molecular switches" that orchestrate cellular signaling and drug response. While PTM dysregulation is a hallmark of cancer and neurodegeneration, the lack of standardized, drug-perturbed datasets has hindered the development of predictive models capable of capturing context-dependent PTM responses. Effective predictive modeling must therefore integrate multidimensional data, including the specific drug, dosage, treatment duration, cellular background, and the modified site. However, existing PTM resources remain largely static and fail to capture drug-induced regulation across these critical dimensions. To address this gap, we present DrugPTM-Bench, a curated, large-scale benchmark derived from decryptM-derived dose-dependent PTM measurements, standardizing site-level drug response across 7 cancer cell lines, 27 drugs, and 11,167 proteins. Comprising 99.5% phosphorylation events, the dataset includes six time points, 16 dosage levels, and pEC50 potency values (half-maximal effective concentration). We formulate a classification task to identify upregulated, downregulated, or unchanged PTM sites (following a drug treatment), a critical step in deciphering drug Mechanism of Action (MoA) and target engagement. Our evaluation reveals that in a protein-disjoint out-of-distribution (OOD) setting, baseline machine learning and deep learning models struggle to recover minority regulation classes, while standard rebalancing strategies improve recall only at the cost of precision and overall F1-score. These results indicate that current methods do not learn robust decision boundaries between regulated and unchanged PTM events. DrugPTM-Bench provides a phosphoproteomics benchmark for modeling drug-induced PTM regulation in imbalanced biological settings.
Beyond classification, DrugPTM-Bench's retention of pEC50 values, drug perturbation profiles, and site-level sequence context enables additional predictive tasks including drug potency regression, mechanism-of-action prediction from PTM fingerprints, and drug-specific PTM site sensitivity ranking, establishing a multi-task benchmark for PTM-centric drug discovery. Ultimately, DrugPTM-Bench establishes a rigorous framework for developing robust, context-aware models to elucidate drug MoA and signaling dynamics.

13

Reliability-weighted target prioritization in CD4+ T-cell Perturb-seq: a generalizability-theory decomposition

Cheng, C.

2026-07-15 bioinformatics 10.64898/2026.07.13.738312 medRxiv

Top 0.1%

21.9%

Genome-scale Perturb-seq screens prioritize candidate targets by the strength of a perturbation's transcriptional effect. Effect strength does not answer a prior measurement question: is the readout dependable? A large effect estimated from a single guide, a single donor, or a pseudobulk of few cells need not survive replication, and for target prioritization each false lead costs a validation experiment. We treat each perturbation effect as a measurement in a crossed Target x Guide x Donor x Condition design and apply generalizability theory (Brennan, 2001; Cronbach et al., 1972) to separate the dependable part of an effect from facet-specific idiosyncrasy. Guides and donors enter as random facets; condition enters as a fixed facet and is analyzed within its levels. For each target we report a dependability profile over the facets and a joint generalizability coefficient over the two random facets, and we re-rank targets by effect magnitude weighted by that coefficient. On the released screen (Zhu et al., 2025), removing the measurement-error floor estimated from the non-targeting controls raises the number of genes with a dependable target-signal share above .10 from 40 to 7,674. Analyzed within activation states, dependability recovers the T-cell-receptor signaling module as reliably measurable only in activated cells, without recourse to gene annotation. A design study indicates that reliability is limited by the number of guides rather than the number of donors, so a future screen should add guides. Every methodological decision was recorded and adversarially reviewed, and all results regenerate from the released summary statistics.
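The abstract does not give its formulas, so the following is only a sketch of the textbook generalizability coefficient for a crossed design with two random facets (guides and donors), followed by the reliability-weighted re-ranking it describes. The function names and all variance components and effect sizes are hypothetical values chosen for illustration.

```python
def g_coefficient(var_target, var_tg, var_td, var_resid, n_guides, n_donors):
    """Generalizability coefficient for a Target x Guide x Donor design.

    Relative error variance averages the target-by-guide, target-by-donor,
    and residual components over the number of guides and donors sampled.
    """
    rel_error = (var_tg / n_guides
                 + var_td / n_donors
                 + var_resid / (n_guides * n_donors))
    return var_target / (var_target + rel_error)

def rerank(effects, coefficients):
    """Order targets by |effect| weighted by their generalizability coefficient."""
    scored = {t: abs(effects[t]) * coefficients[t] for t in effects}
    return sorted(scored, key=scored.get, reverse=True)

# Hypothetical variance components in which guide-related variance dominates.
components = dict(var_target=1.0, var_tg=2.0, var_td=0.5, var_resid=2.0)
baseline = g_coefficient(**components, n_guides=2, n_donors=2)
more_guides = g_coefficient(**components, n_guides=4, n_donors=2)
more_donors = g_coefficient(**components, n_guides=2, n_donors=4)
print(baseline < more_donors < more_guides)  # True

# A large but undependable effect can rank below a smaller dependable one.
print(rerank({"GENE_A": 2.0, "GENE_B": -1.5}, {"GENE_A": 0.2, "GENE_B": 0.8}))
# ['GENE_B', 'GENE_A']
```

With these made-up components, doubling the guides raises the coefficient more than doubling the donors, which mirrors the kind of design-study comparison the abstract reports.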

14

A community machine learning challenge to predict the effects of gene perturbations on T cell differentiation for cancer immunotherapy

Zhang, J.; Schwartz, M. A.; Mutaher, M.; Olajide, O.; Pritykin, Y.; Ashenberg, O.; Hacohen, N.; Uhler, C.

2026-05-22 bioinformatics 10.64898/2026.05.21.726863 medRxiv

Top 0.1%

21.8%

Perturbations of genes with functional importance in T cells could be used to change the distribution of CD8 T cell states to enhance anti-tumor functions for cancer immunotherapies. We launched a world-wide computational challenge to predict the effects of gene perturbations and to devise objective functions for prioritizing gene perturbations that lead to desired T-cell state distributions. We supported the challenge by generating a single-cell Perturb-seq dataset profiling the effect of knocking out 73 individual expert-defined genes in T cells transferred into a mouse melanoma model. We compared the top algorithms developed by participants and found that performance was primarily determined by the prior data used for gene feature representation, with features derived from perturbational data proving most effective. Experimental validation of the top 61 genes nominated by the algorithms revealed that perturbation of Ndufv2 and Dimt1 reached the defined objective and biased T cell differentiation toward desired states.

15

ORBIT: Annotation-Aware Empirical Enrichment and Semantic Reranking for Interpretable Functional-Class Recovery

Kidder, B. L.

2026-07-07 bioinformatics 10.64898/2026.07.01.735870 medRxiv

Top 0.1%

21.8%

Gene-set interpretation workflows are widely used to summarize transcriptomic and proteomic experiments, yet standard enrichment tools often return long, redundant result tables that require substantial manual consolidation. We developed ORBIT (Ontology-Ranked Biological Interpretation Tool), an annotation-aware interpretation workflow that combines empirical enrichment, semantic reranking, and redundancy-aware representative-term selection to prioritize interpretable functional summaries from gene sets. We evaluated ORBIT on a curated tiered benchmark of human functional-class gene sets spanning clean reference sets, size-ladder variants, and mixed-difficulty cases. On the 45-set core benchmark, ORBIT semantic achieved higher expected-class recovery than Enrichr and PANTHER Gene Ontology molecular-function baselines, with a mean reciprocal rank of 0.916 and top-1 recovery of 0.889. Bootstrap confidence intervals and paired permutation testing supported the robustness of this advantage, and supplemental analyses extended the comparison to g:Profiler. In a GPCR mixed-function case study, ORBIT compressed redundant enriched terms into semantic representative neighborhoods, illustrating how long enrichment outputs can be converted into reviewable biological summaries. We then used ORBIT to interpret immune-cell identity, interferon-response biology, and breast-cancer subtype programs. ORBIT linked PBMC3K markers to cytotoxic, antigen-presentation, and innate-immune cell states; prioritized antiviral, cytokine-response, RNA-binding, and secreted-factor biology after IFNB stimulation; and separated TCGA-BRCA basal-like proliferative chromosome/cell-cycle programs from luminal transporter and receptor-associated biology while retaining gene-level support.
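The two benchmark metrics quoted above, mean reciprocal rank and top-1 recovery, follow standard definitions and can be sketched as below. The input convention (one rank per gene set, with `None` when the expected class is not returned) and the example ranks are assumptions for illustration, not ORBIT's actual evaluation code or data.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank of the expected class; a miss (None) contributes 0."""
    return sum(1.0 / r if r else 0.0 for r in ranks) / len(ranks)

def top1_recovery(ranks):
    """Fraction of gene sets whose expected class is ranked first."""
    return sum(1 for r in ranks if r == 1) / len(ranks)

# Hypothetical ranks of the expected functional class for four gene sets.
ranks = [1, 1, 2, 4]
print(mean_reciprocal_rank(ranks))  # 0.6875
print(top1_recovery(ranks))         # 0.5
```

Mean reciprocal rank rewards near misses (a rank of 2 still scores 0.5), whereas top-1 recovery counts only exact first-place hits, so the two are usually reported together.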

16

FlowBench: separating planning, fault recovery and interpretation in agentic bioinformatics

Kurjan, A.; Cribbs, A. P.

2026-06-16 bioinformatics 10.64898/2026.06.12.731844 medRxiv

Top 0.1%

19.1%

Agentic large language model (LLM) systems are being deployed in bioinformatics faster than they are understood, and single-metric evaluations conflate capabilities that fail independently. We introduce FlowBench, a benchmark that decomposes agentic bioinformatics performance into planning, fault recovery, biological interpretation, and end-to-end output-fidelity. Existing systems achieve high plan completeness, but their closed, single-provider designs prevent attribution of performance to scaffolding versus the underlying model. We therefore built FlowAgent, a modular, provider-agnostic framework whose components can be selectively disabled and whose backbone model can be swapped across providers on a shared harness, and used it to evaluate 23 models from three main providers. Three findings emerge. First, generating a valid workflow plan from a named toolchain is largely solved, whereas inferring an appropriate toolchain from biological intent alone is uniformly difficult regardless of model tier, compressing all models into a narrow 44-57% pass-rate band. Second, ablation shows that the dependency-structured plan and a completeness-reflection step drive performance, while adding a same-context validator-driven retry makes structural quality worse. Third, fault recovery and data-grounded interpretation remain unsolved. Models frequently propose fixes that force a clean exit while leaving the underlying data invalid, and data-grounded interpretation lags internal-knowledge recall by a consistent margin. Safety does not emerge from capability, and reasoning-tier models were among the least reliable at recognising unrecoverable faults. Once planning saturates, agent architecture and refusal calibration, not model scale, are the productive frontier.

Availability and implementation: FlowAgent and FlowBench are available under a GPLv3 licence at https://github.com/EnteloBio/flowagent

Contact: adam@entelo.bio

17

The structural context of mutations in proteins predicts their effect on antibiotic resistance

Green, A. G.; Tasmin, M.; Vargas, R.; Farhat, M. R.

2026-06-29 bioinformatics 10.1101/2025.09.23.676583 medRxiv

Top 0.1%

18.9%

In Mycobacterium tuberculosis, a prevalent and deadly pathogen, resistance to antibiotics evolves primarily through non-synonymous mutations in proteins. Sequence-based analyses are currently used to understand the genetic basis of antibiotic resistance, either via genotype-phenotype association, or via signals of convergent evolution. These methods focus on primary sequence and often neglect other biological signals such as protein structural information. We hypothesize that integrating the structural context of mutations improves the prediction of effects on function and phenotype. We curate high confidence structural annotations for the M. tuberculosis proteome from 1,371 crystallography and 2,316 AlphaFold predictions, and combine the structures with mutations from over 31,000 clinical M. tuberculosis isolates. We demonstrate that mutations in proteins known to cause resistance are clustered in 3D space, even in proteins where inactivating mutations at any position are thought to cause resistance. We develop a statistic to search the M. tuberculosis proteome for signal of clustered mutations, finding over 450 proteins that display this signal, many of which have a known relationship with antibiotic resistance. We show that a supervised classifier trained on 3D distance to known resistance sites alone has an F1 score of 94.6% at classifying mutations as resistance-conferring across proteins. This work demonstrates that protein structure provides useful information for categorizing which variants may cause antibiotic resistance, even when the majority of structures are AI-predicted.

18

V3Cell: A Vision-Guided Virtual 3D Cell Framework for Phenotypic Modeling and Perturbation Prediction

Lu, Y.; Xun, D.; Chenke, X.; Xiaobo, Z.; Zhigang, Z.; Pengyu, C.; Xiwen, Y.; Zhengzheng, Y.; Jiahua, R.; Huili, H.; Jianying, H.; Pengwei, H.

2026-06-24 bioinformatics 10.64898/2026.06.23.734130 medRxiv

Top 0.1%

18.9%

Predicting how organoids respond to chemical perturbations is central to disease modeling and drug discovery. Existing virtual cell models operate at the single-cell level, producing static endpoint predictions from destructive assays. This leaves a critical gap at the organoid scale, where biological identity is defined by tissue-level architecture and continuous developmental dynamics rather than single-cell features. Here we introduce V3Cell, a vision-guided framework that constructs in silico surrogates of organoids directly from non-invasive bright-field microscopy. A foreground-aware model constructs static virtual 3D cells across colon, stomach, and lung organoid lineages. These virtual 3D cells closely match real samples across distributional metrics, micro-texture, and lineage-specific morphometrics, with small effect sizes for most descriptors. A temporal module further predicts developmental fate from as few as six early-frame observations and models fate-conditioned spatiotemporal trajectories that closely recapitulate real perturbation responses. V3Cell requires no omics profiling or fluorescent labeling, establishing a non-invasive brightfield-based paradigm for organoid-scale perturbation prediction. Our code and data are publicly available at https://github.com/Laineyoulu/V3Cell.

19

Additive baselines furnish no evidence for epistasis learning by MULTI-evolve

Visani, G. M.; Verma, A.; DeWitt, W. S.

2026-04-24 bioinformatics 10.64898/2026.04.23.719915 medRxiv

Top 0.1%

18.8%

Recent work from Tran et al. (Science, 2026) introduced MULTI-evolve, a framework for protein engineering that combines single-mutant nomination via a protein language model (PLM) or a deep mutational scan (DMS), experimental single- and double-mutant characterization, and neural networks to engineer hyperactive multimutant proteins. The authors attribute the framework's performance to "epistasis-aware modeling" and claim that their neural networks "learn the epistatic landscape" and "identify synergistic interactions" from limited double-mutant training data. Additive models, by definition, cannot represent epistasis, making them a natural null baseline for such claims. Here we show that MULTI-evolve's multimutant predictions are almost perfectly correlated with an additive model's across all three engineering applications (APEX, dCasRx, and HuABC2), such that the engineering of multimutants reduces to combining beneficial mutations with the largest additive effects--a standard protein engineering strategy for over four decades. We also find that MULTI-evolve's neural networks do not outperform an additive model on held-out test set predictions, and do not even represent epistasis in their training data. Finally, we revisit a DMS benchmark finding presented as evidence of epistasis learning and show that the same pattern is expected even under a null additive model, due to an elementary statistical phenomenon; when we fit an additive model to the benchmark data, it reproduces the reported pattern. More broadly, our findings underscore the need to benchmark models for machine learning-guided directed evolution against additive null baselines before attributing performance to learned epistasis.
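An additive null baseline of the kind this abstract argues for can be sketched as follows: the predicted effect of a multimutant is the sum of its single-mutant effects, and a candidate model's predictions are then correlated against that sum. The mutation names, effect sizes, and "model" predictions below are invented for illustration, and the effects are assumed to be on a scale where additivity is meaningful (for example log fold-change); none of this is taken from the paper's data or code.

```python
import math

def additive_prediction(single_effects, mutations):
    """Null model: a multimutant's effect is the sum of its single-mutant effects."""
    return sum(single_effects[m] for m in mutations)

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

# Hypothetical single-mutant effects.
singles = {"A10G": 0.5, "L25F": 0.25, "K40R": -0.125}
multimutants = [["A10G", "L25F"], ["A10G", "K40R"], ["A10G", "L25F", "K40R"]]

additive = [additive_prediction(singles, m) for m in multimutants]
print(additive)  # [0.75, 0.375, 0.625]

# Hypothetical predictions from some other model for the same multimutants.
model_predictions = [0.8, 0.35, 0.6]
print(round(pearson(additive, model_predictions), 3))
```

A correlation near 1 between the two prediction sets would indicate that the candidate model is ranking multimutants much as the additive null does, so any claim of learned epistasis would need evidence beyond its predictive accuracy.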

20

PHI-Reason: evidence-grounded species-level phage-host prediction from structured biological text profiles

Zhang, Y.-z.; Xu, L.; Imoto, S.

2026-06-12 bioinformatics 10.64898/2026.06.10.727770 medRxiv

Top 0.1%

18.7%

Phage-host interaction (PHI) prediction is a fundamental problem in microbiology. Existing approaches typically encode phage and host information as numerical representations derived from sequence similarity, protein content or reference databases, and use them to score candidate hosts or train prediction models. Although effective, this design obscures which biological evidence supports a prediction. Here, we present PHI-Reason, a species-level PHI prediction framework that reformulates host prediction as inference over structured biological text. PHI-Reason converts heterogeneous genomic, functional and contextual evidence into modular natural-language profiles, which a frozen large language model integrates at inference time for candidate-host ranking or pairwise PHI assessment. Across species-level benchmarks, PHI-Reason achieved competitive performance and complementary correct assignments relative to established methods. Its explicit profile design enables evidence perturbation and rationale-grounding analyses, making prediction support and hallucination risk operationally measurable. PHI-Reason provides an interpretable evidence-integration approach and demonstrates how large language models can support evidence-grounded PHI prediction.